 amazon textract


Detect signatures on documents or images using the signatures feature in Amazon Textract

#artificialintelligence

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Signatures is a feature within Amazon Textract that offers the ability to automatically detect signatures on any document. This can reduce the need for human review, custom code, or ML experience. In this post, we discuss the benefits of the AnalyzeDocument Signatures feature and how the AnalyzeDocument Signatures API helps detect signatures in documents. We also walk through how to use the feature through the Amazon Textract console and provide code examples to use the API and process the response with the Amazon Textract response parser library.
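As a rough illustration of what processing a Signatures response can look like, the sketch below filters the Blocks list of an AnalyzeDocument response for blocks whose BlockType is SIGNATURE. It works on the plain response dictionary rather than the Amazon Textract response parser library mentioned above. The helper names find_signatures and analyze_signatures, the min_confidence parameter, and the sample values are assumptions made for this example and do not come from the post.

```python
from typing import Any, Dict, List


def find_signatures(response: Dict[str, Any],
                    min_confidence: float = 0.0) -> List[Dict[str, Any]]:
    """Collect page, confidence, and bounding box for each SIGNATURE block.

    `response` is the dictionary returned by AnalyzeDocument when the
    SIGNATURES feature type is requested. `min_confidence` is a hypothetical
    filter added for this sketch.
    """
    signatures = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "SIGNATURE":
            continue
        confidence = block.get("Confidence", 0.0)
        if confidence < min_confidence:
            continue
        signatures.append({
            "page": block.get("Page", 1),
            "confidence": confidence,
            "bounding_box": block.get("Geometry", {}).get("BoundingBox", {}),
        })
    return signatures


def analyze_signatures(image_bytes: bytes) -> Dict[str, Any]:
    """Call Textract for one image; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parsing helper runs without it

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["SIGNATURES"],
    )


# Hand-written sample shaped like a Textract response (illustrative values).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Signed by:", "Confidence": 99.1},
        {
            "BlockType": "SIGNATURE",
            "Page": 1,
            "Confidence": 98.5,
            "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.8,
                                         "Width": 0.3, "Height": 0.05}},
        },
    ]
}

if __name__ == "__main__":
    for sig in find_signatures(SAMPLE_RESPONSE):
        print(sig["page"], sig["confidence"], sig["bounding_box"])
```

In a real workflow, the output of analyze_signatures would be passed to find_signatures, and a confidence threshold could decide which documents still go to human review.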


2022H2 Amazon Textract launch summary

#artificialintelligence

Documents are a primary tool for record keeping, communication, collaboration, and transactions across many industries, including financial, medical, legal, and real estate. The millions of mortgage applications and hundreds of millions of W2 tax forms processed each year are just a few examples of such documents. Critical business data remains locked in unstructured documents such as scanned images and PDFs, and relying on humans or even legacy OCR to read this data is tedious, expensive, and error prone. This is why we launched Amazon Textract in 2019 to help you automate your tedious document processing workflows powered by AI. Amazon Textract automatically extracts printed text, handwriting, and data from any document.


Classifying and Extracting Mortgage Loan Data with Amazon Textract

#artificialintelligence

Mortgage loan applications, at least in the United States, comprise around 500 or more pages of diverse documents. In order for applications to be reviewed, all these documents need to be classified, and the data on each form extracted. This isn't as easy as it might sound! Besides different data structures in each document, the same data element may have different names on different documents--for example, SSN, or Social Security Number, or Tax ID. These three all refer to the same data.
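One simple way to handle the naming problem described above is a lookup table that maps each label variant to a single canonical key. The sketch below is only an illustration of that idea, not the method used in the article. The table CANONICAL_FIELDS and the helpers normalize_key and normalize_fields are hypothetical names, and the sample values are made up.

```python
import re
from typing import Dict

# Hypothetical synonym table: every known label variant maps to one key.
CANONICAL_FIELDS: Dict[str, str] = {
    "ssn": "ssn",
    "social security number": "ssn",
    "tax id": "ssn",
}


def normalize_key(label: str) -> str:
    """Lowercase the label, collapse punctuation to spaces, then look it up.

    Labels that are not in the table are returned in their cleaned form.
    """
    cleaned = re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()
    return CANONICAL_FIELDS.get(cleaned, cleaned)


def normalize_fields(extracted: Dict[str, str]) -> Dict[str, str]:
    """Rewrite the keys of one document's extracted fields to canonical names."""
    return {normalize_key(label): value for label, value in extracted.items()}


if __name__ == "__main__":
    # Two forms that name the same data element differently.
    print(normalize_fields({"Tax ID:": "123-45-6789"}))
    print(normalize_fields({"Social Security Number": "123-45-6789"}))
```

A fixed table like this only covers variants someone has already seen, so real pipelines typically pair it with review of unmatched labels.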


Build a traceable, custom, multi-format document parsing pipeline with Amazon Textract

#artificialintelligence

Organizational forms serve as a primary business tool across industries--from financial services, to healthcare, and more. Consider, for example, tax filing forms in the tax management industry, where new forms come out each year with largely the same information. AWS customers across sectors need to process and store information in forms as part of their daily business practice. These forms often serve as a primary means for information to flow into an organization where technological means of data capture are impractical. In addition to using forms to capture information, over the years of offering Amazon Textract, we have observed that AWS customers frequently version their organizational forms based on structural changes made, fields added or changed, or other considerations such as a change of year or version of the form.


Announcing support for extracting data from identity documents using Amazon Textract

#artificialintelligence

Creating efficiencies in your business is at the top of your list. You want your employees to be more productive, have them focus on high impact tasks, or find ways to implement better processes to improve the outcomes for your customers. There are various ways to solve this problem, and more companies are turning to artificial intelligence (AI) and machine learning (ML) to help. In the financial services sector, there is the creation of new accounts online, or in healthcare there are new digital platforms to schedule and manage appointments, which require users to fill out forms. These processes can be error prone and time consuming, and can certainly be improved upon.


AMAZON MACHINE LEARNING

#artificialintelligence

What is Amazon Web Services? Amazon Web Services, or AWS, is a broadly adopted cloud platform. AWS provides a number of useful cloud computing services that are, as they say, reliable, scalable, and cost-efficient. These services include storage, networking, remote computing, servers, email, mobile development, and security. Amazon machine learning, put simply, means leveraging ML algorithms on a cloud platform such as AWS.


Announcing specialized support for extracting data from invoices and receipts using Amazon Textract

#artificialintelligence

The ExpenseIndex field uniquely identifies the expense, and associates the appropriate SummaryFields or LineItemGroups detected to that expense. The most granular level of data in the AnalyzeExpense response consists of Type, ValueDetection, and LabelDetection (optional). Let's call this set of data an AnalyzeExpense element. The preceding example illustrates an AnalyzeExpense element that contains Type, ValueDetection, and LabelDetection. In the preceding example, Amazon Textract detected 16 SummaryField key-value pairs, including VENDOR_NAME: New Store X1 and Order type:Quick Sale. AnalyzeExpense detects this key-value pair and displays it as shown in the preceding example.
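To make the structure described above concrete, the sketch below walks an AnalyzeExpense response dictionary and groups each SummaryFields element (Type, optional LabelDetection, ValueDetection) under its ExpenseIndex. The function name summarize_expenses and the sample response are assumptions for illustration. The sample reuses the two key-value pairs quoted above, but how they are split between Type and LabelDetection here is a guess, not the actual output from the post.

```python
from typing import Any, Dict, List, Optional


def summarize_expenses(
    response: Dict[str, Any],
) -> Dict[int, List[Dict[str, Optional[str]]]]:
    """Group SummaryFields elements by the ExpenseIndex of their document."""
    summary: Dict[int, List[Dict[str, Optional[str]]]] = {}
    for document in response.get("ExpenseDocuments", []):
        elements = []
        for field in document.get("SummaryFields", []):
            elements.append({
                "type": field.get("Type", {}).get("Text"),
                # LabelDetection is optional, so it may be missing entirely.
                "label": field.get("LabelDetection", {}).get("Text"),
                "value": field.get("ValueDetection", {}).get("Text"),
            })
        summary[document.get("ExpenseIndex")] = elements
    return summary


# Hand-written sample shaped like an AnalyzeExpense response (illustrative).
SAMPLE_EXPENSE_RESPONSE = {
    "ExpenseDocuments": [
        {
            "ExpenseIndex": 1,
            "SummaryFields": [
                {
                    "Type": {"Text": "VENDOR_NAME"},
                    "ValueDetection": {"Text": "New Store X1"},
                },
                {
                    "Type": {"Text": "OTHER"},
                    "LabelDetection": {"Text": "Order type"},
                    "ValueDetection": {"Text": "Quick Sale"},
                },
            ],
            "LineItemGroups": [],
        }
    ]
}

if __name__ == "__main__":
    for index, elements in summarize_expenses(SAMPLE_EXPENSE_RESPONSE).items():
        for element in elements:
            print(index, element["type"], element["label"], element["value"])
```

LineItemGroups could be traversed in the same way; this sketch leaves them out to stay focused on the summary fields.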


How to Compare OCR Tools: Tesseract OCR vs Amazon Textract vs Azure OCR vs Google OCR

#artificialintelligence

Optical Character Recognition (OCR) tools are software that detects and extracts text from images. They are used in the early steps of the analysis of scanned documents to recognize and automatically process the information that the documents contain. Depending on the complexity of the documents to be analyzed, OCR tools can be used both to detect and to extract text or, in some pipelines, only to extract text from previously identified regions of interest, e.g. paragraphs, tables, and titles. The latter case is of particular interest to me. For this reason, the article focuses on the extraction task.


zomato digitizes menus using Amazon Textract and Amazon SageMaker

#artificialintelligence

This post is co-written by Chiranjeev Ghai, ML Engineer at zomato. zomato is a global food-tech company based in India. Are you the kind of person who has very specific cravings? Maybe when the mood hits, you don’t want just any kind of Indian food—you want Chicken Chettinad with a side of paratha, and nothing […]


Translating PDF documents using Amazon Translate and Amazon Textract

#artificialintelligence

In 1993, the Portable Document Format or the PDF was born and released to the world. Since then, companies across various industries have been creating, scanning, and storing large volumes of documents in this digital format. These documents and the content within them are vital to supporting your business. Yet in many cases, the content is text-heavy and often written in a different language. This limits the flow of information and can directly influence your organization's business productivity and global expansion strategy.